processor and a query processor.



A method and apparatus, embodied in an Intelligent Page Store (10),
for providing concurrent and consistent access by a functionally
separate transaction entity and a query entity to a shared database,
while maintaining a single physical copy of most of the data. The
Intelligent Page Store (10) contains shared disk storage, and an
intelligent versioning mechanism allows simultaneous access by the
transaction entity and the query entity to the shared data. The
transaction entity is presented the current data and the query entity
is presented a recent and consistent version of the data. A single
copy of all but recently updated pages is maintained by the
Intelligent Page Store (10). The query and transaction entities
operate independently of each other and are separately optimized.
FIELD OF THE INVENTION

This invention relates generally to simultaneous 
database transaction processing and query pro- 
cessing. 

DESCRIPTION OF THE PRIOR ART 

In recent years the demand for database transaction processing
capacity in large installations has been growing significantly. At
the same time, a large fraction of new database applications have
been in a relational database environment, which is also an ideal
environment for supporting ad hoc queries on the database. This has
given rise to a concomitant growth in the use of ad hoc unstructured
queries - a trend which is expected to accelerate. Consequently,
there is a growing requirement for simultaneously supporting both
high volume transaction processing and unstructured queries against
the same database. Therefore, a principal objective of this invention
is to design an architecture that effectively supports both high
volume transactions and complex queries, with minimal interference
between the two, while sharing a single copy of most of the data.

Typically enterprises create and maintain their databases through a
high volume of relatively simple transactions.

Each transaction represents a well-understood business operation
(creating a new customer record, noting an account payment or
transfer). Increasingly enterprises are becoming interested in
running more ad hoc unstructured queries against their online data.
This is stimulated by the feasibility of writing these more complex
queries in SQL. Typical applications might be: testing new market
opportunities, decision support, detecting historical trends,
profitability analysis etc. These unstructured queries are
characterized by:

o they are unplanned and not frequently used -
performance tuning for each query is not
practical

o they do not modify the operational business
data

o they can execute against somewhat old
database data without loss of value

o they may require large amounts of data scanning
and processing - hence have long execution
times compared with the standard transactions.

In Chan, A., Fox, S., Lin, W.T., Nori, A., and Ries, D., "The
Implementation of an Integrated Concurrency Control and Recovery
Scheme", Proc. ACM SIGMOD Conf., 1982, pp. 184-191, a versioning
scheme is described. In this scheme, different versions of pages are
chained, and again each version is identified by the ID of the
transaction that created it. Each query has associated with it a copy
of a Completed Transaction List (CTL) that is in effect at the time
of its initiation. Query access is by chasing down the chain of
physical pages till a version in the query's CTL is detected. First,
this requires information on completed transactions to be available
to the query processor, again preventing transaction and query
processing from being functionally separated. Second, chasing down
pages may require several I/Os. Third, in this scheme pages must be
forced to disk by committing transactions. Finally, the scheme
supports only page level locking for transactions. The scheme is
generalized to a distributed environment in Chan, A., and Gray, R.,
"Implementing Distributed Read-Only Transactions", IEEE Trans.
Software Engrg., Vol. SE-11, No. 2, Feb. 1985, by using a complex
scheme in which CTLs are sent between sites and are merged to create
new versions of CTLs.

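
The chain lookup attributed to Chan et al. above can be pictured with
a short Python sketch. This is only an illustration of the idea as
summarized here, not code from that paper: the names PageVersion and
read_for_query, and the use of a string as a page image, are
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass
class PageVersion:
    creator_txn: int                        # ID of the transaction that created this version
    value: str                              # page contents (a string stands in for a page image)
    older: Optional["PageVersion"] = None   # link to the next older version in the chain


def read_for_query(newest: PageVersion, ctl: FrozenSet[int]) -> Tuple[Optional[str], int]:
    """Walk the chain from newest to oldest.

    Returns the value of the first version whose creator is in the
    query's CTL, together with the number of versions inspected
    (each inspection may cost an I/O).
    """
    inspected = 0
    version: Optional[PageVersion] = newest
    while version is not None:
        inspected += 1
        if version.creator_txn in ctl:
            return version.value, inspected
        version = version.older
    return None, inspected


# Versions created by transactions 1, 2 and 3, newest first.
v1 = PageVersion(creator_txn=1, value="a")
v2 = PageVersion(creator_txn=2, value="b", older=v1)
v3 = PageVersion(creator_txn=3, value="c", older=v2)

# The query started when only transactions 1 and 2 had completed.
value, inspected = read_for_query(v3, frozenset({1, 2}))
```

The sketch makes the two criticisms concrete: the query side needs
the CTL, which is transaction-side information, and the number of
versions inspected grows with the length of the chain.
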
In Robinson, J., Thomasian, A., and Yu, P., "Elimination of Lock
Contention in Relational Databases Accessed by Read-Only Queries and
On-Line Update Transactions", IBM Technical Disclosure Bulletin,
Vol. 31, No. 1, pp. 180-185, June 1988, an explicit page versioning
method for queries and transactions that both access data by
requesting locks at a common concurrency controller is described. The
scheme requires knowledge of which pages are locked by transactions
and queries, and when a lock contention is detected, a version is
created for the query to access. For queries this is done by keeping
status arrays of queries in progress and checking these arrays when a
transaction makes a lock request that results in a conflict. The
scheme also requires that committed updated pages by transactions be
immediately accessible by queries. Essentially, this scheme requires
that queries and transactions be run under a single DBMS (common
concurrency control manager and buffer manager), so that locks made
by queries and transactions are known to each other. (The disclosure
does not describe how garbage collection is done to remove versioned
pages that are no longer required.)

The general difference between the prior art outlined above and this
invention is that in the prior art, queries and transactions are not
mutually functionally separated. That is, in the prior art there is a
single concurrency control entity that ensures consistent access
among transactions and between transactions and queries. This
precludes independent implementation and optimization of query and
transaction processing. There is another extreme in the prior art
that separates the data accessed by queries and transactions by
making a complete copy of the database.

It is the object of the invention to overcome these disadvantages.

This object is accomplished by the features of the main claims.
Further advantages of the invention are characterized in the
subclaims. The invention relates, more particularly, to functionally
separating a database transaction entity and a query entity that
access the same data by providing a method and apparatus called an
intelligent page store that provides two access paths, one for the
transaction entity and one for the query entity, and an implicit
versioning mechanism, that provide the transaction entity with the
most recent data and provide the query entity with a recent but
consistent version of the data, while physically maintaining a single
copy of most of the data.

SUMMARY OF THE INVENTION

In accordance with a preferred but nonetheless illustrative
embodiment demonstrating objects and features of the present
invention there is provided a novel method and apparatus for
simultaneous database transaction processing and query processing
wherein an intelligent page store containing shared disk storage is
provided. The intelligent page store provides two access paths to the
shared data, one by a transaction entity and one by a query entity.
In the intelligent page store an implicit versioning mechanism allows
simultaneous access by the transaction entity and the query entity to
the shared disk storage, wherein the transaction entity is presented
the current data and wherein the query entity is presented a recent
and consistent version of the data. Furthermore, a single copy of all
but recently updated pages is maintained by the intelligent page
store, and the query and transaction entities operate independently
of each other.

As relational database queries become more complex, parallel
intra-query processing, which exploits a large number of processors
cooperating on the same query, has become important as a means of
improving query response times, and providing incremental growth. On
the other hand, transaction processing is, for the most part, not
amenable to intra-transaction parallelism, but requires the support
of a large number of concurrent transactions with sub-second response
times. Reducing data contention by shortening lock hold times becomes
critical as the transaction rate increases. This favors large
processors with shared buffers. Therefore, a principal objective of
this invention is to provide a logical database with two paths for
accessing data: one for database transactions, and another for ad hoc
queries. This allows the transaction and query processing systems to
be independently optimized, while providing access to the same data.
For instance, update and transaction traffic can exploit the
performance of large processors in tightly coupled shared memory
configurations, while complex queries against the same data can be
handled by parallel database software on loosely coupled
micro-processors.

In the environment that supports transactions and queries with the
above characteristics, further objectives of this invention are as
follows:

o Disks and disk controllers are a significant
component of total cost when a large database
is present. Thus the disk space for combined
transaction and query processing should be
minimized. Thus, an objective of this invention
is that online data should be shared by
transactions and queries.

o Complex queries will sometimes have execution
times, and lock holds, which are significantly
longer than the response time of the
"structured transaction" for which throughput
is important. Therefore, yet another objective
of this invention is that the complex queries
should see a consistent view of the database
data without withholding locks from the
transaction traffic.

o It is still another objective that the
transaction processing database software and
the query processing DB software should be
effectively decoupled (no access to each
other's buffers in memory or exchange of lock
information). In general this allows software
for transaction and query processing to be
independently optimized.

These, and other, objects, advantages, and features of the invention
will be more apparent from the following description and the appended
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a schematic diagram describing the
relevant system structure for prior art
transaction and query systems;
Fig. 2 is a schematic diagram of the system
structure for the preferred embodiment of the
invention: an Intelligent Page Store for
concurrent and consistent access to data by
transactions and queries;
Fig. 3 is a state-time diagram showing when
database snapshots are created, what it means
for them to be consistent and what it means to
maintain a query version;
Fig. 4 is a state-time diagram showing how
the number of copies of each page in the page
store varies over time as a result of write
requests from the transaction processor and
new query version creation;
Fig. 5 defines the algorithms and logic used
by the Intelligent Page Store in the preferred
embodiment to implement implicit versioning.

DESCRIPTION OF THE PREFERRED EMBODIMENT

PRIOR ART TRANSACTION AND QUERY SYSTEMS

To simplify the description of our invention, we summarize the
essential features of prior art systems for transaction and query
processing with a schematic in Figure 1.

In this figure, box 1 shows a set of Transactions X1,X2,...Xm and
Queries Q1,Q2,...Qn executing against the database. In our
environment many transactions may need to be executing concurrently
against the data to provide adequate throughput. However the view of
data seen by these programs is that they execute serially, atomically
and without interference.

A Transaction Processor 3 provides the concurrency control, locking,
data access checking, index management, buffering and data protection
needed to satisfy the view of data expected by Transactions 1. In
most prior systems, update transactions and read-only queries are not
separated out and handled independently. The Transaction Processor is
typically tuned to support the required throughput of Transactions.
Some concurrent queries are also supported on this Transaction
Processor but difficulties arise when a high throughput of update
transactions is required in the presence of concurrent, much longer
running read-only queries.

Interaction between Transactions 1 and the Transaction Processor 3
consists of data access requests to read and write database records
and information on when to commit or abort transactions. This is
illustrated as Interface 2.

The word processor in the Transaction Processor is not meant to imply
any particular system organization, machine packaging or physical
unit boundaries. The Transaction Processor function could execute on
a single physical processor, multiprocessor, network of clustered
processors or as a component sharing a processing system with other
functions. We freely use processor in this way from now on.

An important component of the Transaction Processor is the Buffer 4.
This is a pool of fast access storage (such as electronic memory)
managed by the Transaction Processor. To read or modify database
data, pages must be read into the Buffer from pages of Transaction
Database Data 8 stored on a non-volatile medium (such as magnetic
disks). Updates are made to pages in the Buffer which must eventually
be written back to the non-volatile Database Data storage.

Transaction processing also makes use of a Transaction Database Log 6
to protect the database data from transaction aborts and system
failures. The Transaction Database Log must be stored on non-volatile
storage (such as magnetic disks). Standard terminology and algorithms
describing what the transaction processor should write into the log
and how it should be coordinated with buffer management are described
in C. MOHAN, D. HADERLE, B. LINDSAY, H. PIRAHESH, P. SCHWARZ,
"ARIES: A TRANSACTION METHOD SUPPORTING FINE GRANULARITY LOCKING AND
PARTIAL ROLLBACKS USING WRITE AHEAD LOGGING", IBM Research Report
RJ 6849 1/23/89,

and

R.A. CRUS, "DATA RECOVERY IN IBM DATABASE 2", IBM Systems Journal,
Vol 23 no 2 1984.

The Transaction Database Log 6 and Transaction Database Data 8 must
both be stored on non-volatile storage. This is usually provided
using magnetic disks but any comparable storage medium would suffice.
The Transaction Processor issues streams of page read and write
requests for the log via Interface 5 and for the Database Data via
Interface 7. The content of these page read and write commands is
defined by the Transaction Processor. The basic requirement to
provide the non-volatile storage services required by Interfaces 5
and 7 is that any page once written can have its contents exactly
retrieved by a read request at any later time.

SYSTEM STRUCTURE FOR INTELLIGENT PAGE 
STORE METHOD AND APPARATUS 

Referring now to FIG. 2, there is shown a block diagram of the system
structure for our invention: an intelligent page store scheme to
allow concurrent processing of queries and transactions.

It consists of a Transaction Processor 3, a Query Processor 17, and
an Intelligent Page Store 10. A fundamental difference between this
and the prior art transaction and query systems is that update
Transactions are separated out from Queries and served independently
by a Transaction Processor and a Query Processor. The Intelligent
Page Store provides the Transaction and Query Processors with access
to shared physical pages of the database in a way which supports
important performance requirements (concurrent transaction and query
processing) at minimal cost in the amount of non-volatile storage
required.

The set of Transactions 9 supported by the Transaction Processor
contains only update transactions and read-only transactions which
require access to current data from the database. This is represented
by the subset X1,X2,...Xm of the Transactions And Queries 1 handled
by the Transaction Processor 3 in Figure 1. Transactions see the
database data with the same view as in prior art - as atomic,
non-interfering, serialized actions. This is illustrated by the fact
that they interact with the Transaction Processor 3 via the same
Interface 2 of record access requests and commit / abort information.

The function of the Transaction Processor 3 and its Buffer 4 in this
figure are the same as in Figure 1. Together they provide the
buffering, locking, concurrency control and index management services
to support the view of data expected by Transactions X1,X2,...Xm. The
Transaction Processor continues to write its recovery log with a
stream of page read and write requests via Interface 5 just as it
would if the recovery log were stored directly on non-volatile
storage attached to it. It also reads and writes database data pages
via Interface 7 exactly as in prior art.

The Query Processor 17 supports read-only queries which can accept
data which is not completely current provided that it is consistent.
The set of Queries 15 is shown with example queries Q1,Q2,...Qn which
were in prior art served by the Transaction Processor. Queries
Q1,Q2,...Qn have explicitly been identified and separated from
Transactions X1,X2,...Xm so that they can be processed independently.
Queries received via this separate interface are processed with the
assumption that they are read-only and do not need completely current
data. The separate interface, a stream of record read accesses known
to contain only requests from queries, is shown as Interface 16.

The Query Processor 17 must provide access checking, index
management, buffering etc. for queries. It is possible for the same
software and hardware processing to be used for Query Processor 17 as
was used for Transaction Processor 3 in prior art where it also
served queries. However, since most existing database systems are
driven by the requirement to support adequate transaction throughput,
having query processing handled by a separate entity allows retuning
or redesign of the software and hardware processing to support
read-only queries alone. Since the requirements for concurrency
control, locking and data protection in a query-only system are
substantially relaxed relative to general purpose transaction
processing, considerable performance and cost performance gains
become possible.

The Query Processor 17 will include some internal page buffering, but
since the buffer management scheme may differ from that used in
Buffer 4, buffering is not explicitly identified as a subcomponent.

Interface 18 enables the Query Processor 17 to request pages of
database data via a stream of page read requests. The data supplied
presents a consistent view of the database data from some recent
time.

The Intelligent Page Store 10 is a new concept which makes the
separated processing of transactions and queries feasible without
great cost in non-volatile storage (independent copies of all the
database data for use by transaction and query processing
respectively).

The Intelligent Page Store contains a Processing part 11 and a
Non-Volatile Storage part 12. The Processing in the Intelligent Page
Store consists of a Version Manager 13, which handles the page read
and write requests from the Transaction and Query Processors via
Interfaces 5, 7, 18. The Non-Volatile Storage in the Intelligent Page
Store acts as a repository for the Transaction Database Log 6 and the
Transaction Database Data 8. This Transaction Database Data is
exactly that shown as 8 in Figure 1, i.e. it is the backing storage
for the current copy of every page which the Transaction Processor 3
has written to Non-Volatile Storage. The Intelligent Page Store
provides additional Non-Volatile Storage for pages of Query Version
Data 14. These allow a consistent query view of the database to be
presented to the Query Processor via requests in Interface 18.

The function of the Version Manager 13 is to control access to shared
logical pages of data by the Transaction Processor and Query
Processor while preventing them from affecting each other's
performance significantly, and minimizing total requirements for
physical non-volatile storage. As a result the total non-volatile
storage needed to save the Transaction Database Data 8 and the Query
Version Data 14 is considerably less than twice the amount which
would have been needed for Transaction Database Data in the prior
art.
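
The storage claim can be made concrete with a little arithmetic. The
Python sketch below assumes the single query version case, where a
page needs at most one extra physical copy and only if it has been
updated since the query version was created. The function names and
the page counts are hypothetical, chosen for illustration, and the
log is left out of the count.

```python
def physical_pages_needed(total_pages: int, updated_since_snapshot: int) -> int:
    """Pages of non-volatile storage under implicit versioning (log excluded).

    One copy of every logical page, plus one extra copy for each page
    updated since the query version was created.
    """
    if not 0 <= updated_since_snapshot <= total_pages:
        raise ValueError("updated_since_snapshot must be between 0 and total_pages")
    return total_pages + updated_since_snapshot


def full_copy_pages_needed(total_pages: int) -> int:
    """Pages needed if queries run against a complete second copy of the database."""
    return 2 * total_pages


# Hypothetical figures: 1,000,000 logical pages, 20,000 updated since the snapshot.
shared = physical_pages_needed(1_000_000, 20_000)
duplicated = full_copy_pages_needed(1_000_000)
```

With these made-up numbers the shared scheme needs 1,020,000 pages
against 2,000,000 for a full duplicate. The gap narrows as more pages
are updated between snapshots.
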

The Version Manager 13 responds to page read requests from the Query
Processor so as to present a recent consistent version of database
data via Interface 18, and at the same time to appear as simple
non-volatile storage in response to requests from the Transaction
Processor via Interfaces 5 and 7.

In order to prevent long running complex queries from locking out
transaction updates on data read by the queries while preserving
consistent access, queries see a consistent query snapshot of the
data; transaction updates are made to a logically separate
transaction version. However, in order to share most of the same
pages between the transaction and query versions of the data, all
corresponding logical pages of the transaction and query version data
are supported from a single page of physical storage, except for
pages which have been updated by a transaction since the query
version was created by a process called database snapshot. We refer
to this method of using shared physical pages to support independent
transaction and query views as implicit versioning. The method is
described in more detail using Figure 3 and Figure 4.

The Intelligent Page Store also includes a mechanism for determining
when a new time step should be taken. This can be based on an
internal algorithm or an external prompt from users, the Transaction
Processor or the Query Processor.
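
The text leaves the internal algorithm open. As one possible reading,
the Python sketch below combines an external prompt with two internal
triggers: the age of the current query snapshot and the number of
extra page copies accumulated since it was taken. The class name, the
two triggers and their threshold values are all assumptions made for
illustration; none of them is given in the description.

```python
from dataclasses import dataclass


@dataclass
class TimeStepPolicy:
    max_snapshot_age_s: float = 3600.0   # hypothetical: refresh at least hourly
    max_extra_copies: int = 10_000       # hypothetical: cap on extra page copies

    def should_step(self, snapshot_age_s: float, extra_copies: int,
                    external_prompt: bool = False) -> bool:
        """Return True when a new query snapshot should be created."""
        if external_prompt:              # a user or one of the processors asked
            return True
        if snapshot_age_s >= self.max_snapshot_age_s:
            return True                  # queries would see data that is too old
        return extra_copies >= self.max_extra_copies


policy = TimeStepPolicy()
```

Capping the number of extra copies ties the trigger to the storage
objective, while the age limit bounds how stale the query view may
become.
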

To implement implicit versioning, the Version Manager 13 is
responsible for routing page accesses from both transaction and query
processing to the correct physical pages, for maintaining and
managing versions of pages, for initiating and processing the
creation of new query snapshots and for recovering and reusing
physical storage from old page versions.

The Transaction Database Log 6 saved on Non-Volatile Storage 12 in
the Intelligent Page Store 10 is exactly the information which the
Transaction Processor 3 needs to save on non-volatile storage to
protect database data against transaction aborts and system failures.
The Version Manager 13 in the Intelligent Page Store responds to read
and write requests in the log Interface 5, saving the information in
the Transaction Database Log 6 on the Intelligent Page Store
Non-Volatile Storage and returning it to subsequent read requests.
The advantage of having the Database Log in the Intelligent Page
Store is that the log information can be used to create consistent
versions of database data without disturbing the Transaction
Processor or reducing its throughput. Since a log is maintained only
for update transactions and the Query Processor needs no access to
transaction log information, the amount of non-volatile storage
required to store the Database Log in the Page Store is the same as
if the log were directly attached to the Transaction Processor as in
prior art.

Similarly the Version Manager responds to Transaction Database Data
page read and write requests in Interface 7 by saving page images in
the Intelligent Page Store Non-Volatile Storage for Transaction
Database Data 8 and returning them on subsequent read requests. This
enables the Version Manager to create an additional physical copy of
pages in Query Version Data 14 to support a consistent view of the
database for queries only if some transaction has modified the page
since the query version was created by the database snapshot
processing. Since non-volatile storage is a significant component of
the cost of database installations, avoiding unnecessary separate
physical copies of pages for the Transaction and Query Processors is
important. Implicit versioning, described in Figure 3 and Figure 4,
includes an efficient scheme for determining when a copy of a
database data page must be made to meet the requirement of presenting
a consistent view to queries and the appearance of a non-volatile
medium to transaction processing. This enables queries and
transactions to execute concurrently without unnecessary replication
of the database pages and hence at minimal cost.

The Non-Volatile Storage 12 in the Intelligent Page Store 10 can be
implemented with any standard non-volatile medium (such as magnetic
disks) for storing the Transaction Database Log 6, Transaction
Database Data 8 and Query Version Data 14.

IMPLICIT VERSIONING: MANAGEMENT OF QUERY VERSIONS

Figure 3 is a state diagram defining implicit versioning by showing
the relationship between database states, query snapshots,
transactions and queries as time progresses. The diagram is not to
scale in the sense that queries and query versions may have lifetimes
which may be 100 times or 1000 times longer than typical transaction
lifetimes.

We describe implicit versioning for the case where transaction access
to current data and one query snapshot are supported. A
straightforward extension of the scheme can reduce disruption when
new snapshots are created by allowing additional query versions at
the cost of additional non-volatile storage.

Figure 3 is a state-time diagram with time advancing from left to
right. Events directly above each other are simultaneous.

Referring to Figure 3, the top section shows Lifetimes
19,20,21,22,23,24,25 of sample update transactions X1,X2,...X7
respectively executing on the Transaction Processor. The left end of
each of these boxes shows when an update transaction begins execution
and starts holding locks and database resources. The right hand end
shows when it ends and commits its updates in the database. Notice
that transactions are overlapped (execute concurrently) and are
committed in an order which is not necessarily their order of
starting.

The next section of Figure 3 shows states of the database as it
evolves in time as a result of update transactions X1,X2,...X7 being
applied. The initial Database State 26 represents the state of the
database at some starting time. Subsequent states of the database
27,28,29,30,31,32,33 show the changed states after transactions
X1,X2,...X7 commit in order. Notice that the transaction database
state advances in small granular steps each containing the updates of
one transaction. The order in which transactions commit, X1 before X2
before X3 etc., will be exactly reflected in the Transaction Database
Log 6 generated by the Transaction Processor 3. In these state
diagrams, all updates from preceding transactions are included and no
updates from following transactions are included. For example state
30, "database state after transaction X4", includes the updates from
X1,X2,X3 and X4 but no updates from X5,X6 or X7. We use the term
consistent to define this property of a database state.
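
This definition of a consistent state amounts to applying a prefix of
the commit-ordered log. The Python sketch below illustrates that
reading; the log layout (one entry of page updates per committed
transaction), the function name state_after, and the page values are
assumptions made for the example, not the log format used by the
Transaction Processor.

```python
from typing import Dict, List, Tuple

# One entry per committed transaction, in commit order:
# (transaction name, {page: new value}). Contents are made up for illustration.
CommitLog = List[Tuple[str, Dict[str, str]]]


def state_after(initial: Dict[str, str], log: CommitLog, last_txn: str) -> Dict[str, str]:
    """Apply the updates of every transaction up to and including last_txn."""
    state = dict(initial)            # leave the initial state untouched
    for txn, updates in log:
        state.update(updates)
        if txn == last_txn:
            return state             # stop: later transactions are excluded
    raise KeyError(f"{last_txn} not found in log")


initial = {"P1": "a", "P2": "b", "P3": "c"}
log: CommitLog = [
    ("X1", {"P1": "a1"}),
    ("X2", {"P2": "b2"}),
    ("X3", {"P1": "a3"}),
    ("X4", {"P3": "c4"}),
    ("X5", {"P2": "b5"}),
]

after_x4 = state_after(initial, log, "X4")
```

Here after_x4 contains the effects of X1 through X4 and nothing from
X5, which is the property the text calls consistent.
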

The next section of Figure 3 including 34,35,36,37 shows Query
Versions being created by taking a snapshot of database state and
maintained for use by queries.

In action 34 the Version Manager takes a snapshot of the initial
Database State 26 to create Query Version V0 whose lifetime is shown
by 36. The length of 36 shows exactly the lifetime during which Query
Version V0 is available for use by queries running on the Query
Processor. Action 35 shows the Version Manager, at some time after
transaction X3 has committed, taking a snapshot of the consistent
database state 29 and using it to create Query Version V1. The
lifetime during which Query Version V1 is available to queries is
shown by the exact length of 37.

The next section of Figure 3 shows sample query lifetimes: the
periods during which Q1 and Q2 are being processed are shown as
active Lifetimes 38 and 39 respectively. Throughout this time Q1 and
Q2 must have access to the data of "Query Version V0" 36. Queries Q3
and Q4 have Lifetimes 40, 41 respectively; these queries must have
access to the data of "Query Version V1" 37.

The implicit versioning scheme implemented by the Version Manager 13
has to deal with the fact that the Transaction Processor 3 will write
out pages of data via Interface 7 in a way which optimizes the use of
its Buffer 4. In particular, pages will be written out including
uncommitted data and without any guarantee that the set of pages
written out represents a consistent state of the transaction database
in the sense of database States 26,27,28,29,30,31,32,33. The
Intelligent Page Store has to receive these pages and return them on
subsequent read requests, without disturbing the view of data seen by
longer running queries in the Query Processor; these must see the
Query Versions 36,37. In implicit versioning, this is achieved by
advancing the version of data seen by the queries in discrete time
steps. At each time step a new query snapshot is created and made
visible to queries. Each snapshot is a consistent view of committed
database data at some recent time. Between time steps, the
Intelligent Page Store presents a constant view of the data to
queries. When the Transaction Processor writes out updated pages to
the Intelligent Page Store, the updated pages are saved but not made
visible to the queries until the next time step.
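
The visibility rule can be sketched as a purely logical model in
Python. It is a deliberate simplification: it assumes every page
written before a time step holds only committed data, so it skips the
log processing that the page store would need in order to build a
consistent snapshot, and it says nothing about physical page layout.
The class and method names are hypothetical.

```python
from typing import Dict


class QueryVisibility:
    """Logical view only: what transactions and queries read between time steps."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self._query_view = dict(pages)     # version currently visible to queries
        self._saved: Dict[str, str] = {}   # pages written since the last time step

    def txn_write(self, page: str, value: str) -> None:
        self._saved[page] = value          # saved, but not yet visible to queries

    def txn_read(self, page: str) -> str:
        # Transactions always get back the latest value they wrote.
        return self._saved.get(page, self._query_view[page])

    def query_read(self, page: str) -> str:
        return self._query_view[page]      # constant between time steps

    def time_step(self) -> None:
        # Simplifying assumption: everything saved so far is committed.
        self._query_view.update(self._saved)
        self._saved.clear()


store = QueryVisibility({"P1": "a", "P2": "b"})
store.txn_write("P1", "x")
before = (store.txn_read("P1"), store.query_read("P1"))
store.time_step()
after = (store.txn_read("P1"), store.query_read("P1"))
```

Before the time step the transaction side reads "x" while queries
still read "a"; after it both read "x".
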

IMPLICIT VERSIONING: MANAGEMENT OF PAGE COPIES
Implicit versioning also determines accurately when an additional
copy of a database data page is required to support the current
transactional and query views. This enables non-volatile storage
requirements to be minimized.

The management of page copies by implicit versioning is described by
a time state diagram in Figure 4. Time advances from left to right in
this diagram. Events directly above each other are simultaneous.

The initial state of the database data, i.e. the combined Transaction
Database Data 8 and the Query Version Data 14, in the Intelligent
Page Store 10 is shown as 47. This is actually a composite state
represented by a set of logical pages P1, P2, ... P7 which store a
set of page values. We represent the values stored in these pages by
letters "a","b","c","d","e","f","g" respectively. This database state
is assumed to be a consistent state of the transaction database and
the actual state of transaction data, as would occur if transaction
processing had just restarted after being quiesced. State 26 in
Figure 3 illustrated this case. We assume that a snapshot of this
consistent state has been taken as shown in Action 34 and a Query
Version created corresponding to it as in "Query Version V0" 36.

A key point is that in this State 47 no page is stored twice; a
single copy of each of pages P1, P2, ..., P7 is adequate to support
correct transaction and query views of the database data. This set of
physical pages is acting as Transaction Database Data 8; no
additional physical storage is required for Query Version Data 14 at
this time.

A sequence of page read and write request Actions 42, 43, 44, 45 from
the Transaction Processor via Interface 7 affects the page data
stored in the Intelligent Page Store. Action 46 shows the effect of
subsequently creating a new query version by taking a snapshot of the
database. The sequence of states of data storage resulting from these
operations is shown as States 48, 49, 50, 51, 52.

Action 42 is a request from the Transaction Processor to read the
value of page P5; it receives "e", the current value in the page
store. The values stored for pages P1, P2, ..., P7 have not been
changed and no copies have been made.

The next Action 43 is a request from the Transaction Processor to
write the value "x" into page P3. State 49 shows that a copy of page
P3 is made so that both the old value "c" and the new value "x" can
be saved. At this point queries will see the old value residing in
Query Version Data 14, whereas Transaction Processor requests to
non-volatile storage will see the new value "x" in a copy of the page
in Transaction Database Data.

Action 44 is a request from the Transaction Processor to read page
P3. This will receive "x", the value of the page written there
previously at the request of the Transaction Processor. State 50, the
state of the page store after Action 44, shows that the state of the
page store is not changed as a result of this read operation.

Action 45 is a subsequent write to page P3 in which the Transaction
Processor requests that value "y" be written. State 51 shows that the
new value "y" is written over the old transaction value "x" but that
no new copy of the page is made, since the value needed by queries in
this time step, namely "c", is already saved in a copy.

When a new query version is created by taking a snapshot of the
database in Action 46, the old query value "c" for page P3 can be
discarded and the non-volatile storage used for this copy recovered
for reuse, as is illustrated by State 52. Assuming that the
transaction which was responsible for writing the value "y" into page
P3 has committed before the "snapshot" is made, then this is the
value which will be made available to queries in the next time step.

Read requests from the Query Processor in the first query time
interval will see values "a", "b", "c", "d", "e", "f", "g" for pages
P1, P2, ..., P7 regardless of when the requests are made in this time
interval, as is shown by "V0 READS from QUERY PROCESSOR" 53. Read
requests from the following interval will see the new values
"a", "b", "y", "d", "e", "f", "g" for logical pages P1, P2, ..., P7,
as is shown by "V1 READS from QUERY PROCESSOR" 54.

Note that by State 52, physical storage used for Query Version Data
14 is recovered for reuse by the next query version.

Figure 4 is logical: current databases often contain more than 10,000
data pages rather than the seven used in this description by example.

LOGIC IN VERSION MANAGER IMPLEMENTING IMPLICIT VERSIONING

Figure 5 shows the logic in the Version Manager 13 used to implement
implicit versioning. It is a control flow graph.

The data structures in the Version Manager consist of:

page to file map

This maps page numbers to the location in the file system where the
primary copy of that logical page is stored. At times when the
transaction database state is identical to the current Query Version
(e.g. State 47 in Figure 4) the Transaction Database Data 8 will
consist of the primary copy of each logical page in the database.

copy flag vector

This maintains one bit of information for each logical page in the
database indicating whether an auxiliary copy has been made as part
of Query Version Data 14.

copy index

This locates where the auxiliary page is in work storage. It may have
the form of a B-Tree.

Referring now to the flowchart in Figure 5, the Input Events 55 to
the Version Manager are a request to "read a page" 56 or "write a
page" 57 from the Transaction Processor, to "read a page from the
query processor" 58, or to "create a new query version" 59.

On receiving Event 56, "request from the Transaction Processor to
read a page", with the number of the page to read and a return page
value expected, Action 60 looks up the location of this page using
the page to file map. Action 61 obtains the page value by reading
from the file system at this location and Action 62 returns this
value in response to the request.

On receiving Event 57, "Page Write from the Transaction Processor",
with the page number to write and value to be written, the copy flag
vector is checked in Action 63 to determine whether an auxiliary copy
of the page has already been made with the data needed by the current
query version. If not, Action 64 allocates space for a query version
copy out of a work space of page frames, copies the query value of
the page previously stored in the primary copy to this allocated
space, updates the copy index to point to the copy and sets the flag
in the copy flag vector corresponding to this logical page to
indicate that a copy has been made.

From this point processing is the same whether or not a query copy
had to be made. In Action 65 the location of the primary copy is
determined by looking up the requested page number in the page to
file map, the new value is written over any previous value in Action
66 and the request is then completed by Action 67.

Our preferred embodiment, given present query and transaction system
characteristics, is that query versions are made "out of position"
and the primary copy is used for the most recent transaction value.
This choice is not essential to the implicit versioning concept.
On receiving Event 58, a "Page read request from the Query
Processor", with a page number to read and a returned value expected,
Action 68 checks the copy flag vector to determine whether a query
copy of the page has already been made. If it has, the copy index is
used to locate this copy in Action 72 and the value is read from this
location by Action 73. If not, Action 69 uses the page to file map to
locate the primary copy and Action 70 obtains the page value by
reading from this location in the file system. Action 71 returns to
the requester the resulting value from whichever of these paths is
taken.

When a new query version is to be made, Event 59, assuming that the
set of primary pages represents a consistent state of the transaction
database, the auxiliary pages are freed up and their contents
discarded in Action 74. Then Action 75 resets the copy index and copy
flag vector to indicate that no auxiliary query version pages
currently exist. Finally Action 76 denotes the completion of
processing for this event.
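
The event handling described for Figure 5 can be illustrated with a small in-memory model. The following Python sketch is not taken from the specification: the class name, method names and the use of dictionaries in place of a file system, a bit vector and a B-Tree are all assumptions made for illustration. It replays the page values used in the Figure 4 description.

```python
from typing import Dict


class VersionManager:
    """Toy model of implicit versioning; every name here is illustrative."""

    def __init__(self, initial_pages: Dict[str, str]) -> None:
        # Stand-in for the file system: location -> stored page value.
        self.file_system: Dict[int, str] = {}
        # page to file map: logical page -> location of its primary copy.
        self.page_to_file_map: Dict[str, int] = {}
        for location, (page, value) in enumerate(initial_pages.items()):
            self.file_system[location] = value
            self.page_to_file_map[page] = location
        # copy flag vector: one flag per logical page.
        self.copy_flag: Dict[str, bool] = {p: False for p in initial_pages}
        # copy index: logical page -> slot in work storage (a dict here,
        # although the text notes that it may be a B-Tree).
        self.copy_index: Dict[str, int] = {}
        self.work_storage: Dict[int, str] = {}
        self._next_slot = 0

    def transaction_read(self, page: str) -> str:
        # Actions 60 to 62: always read the primary copy.
        return self.file_system[self.page_to_file_map[page]]

    def transaction_write(self, page: str, value: str) -> None:
        # Actions 63 and 64: on the first write to a page in a time step,
        # save the value that queries must continue to see.
        if not self.copy_flag[page]:
            slot = self._next_slot
            self._next_slot += 1
            self.work_storage[slot] = self.transaction_read(page)
            self.copy_index[page] = slot
            self.copy_flag[page] = True
        # Actions 65 to 67: overwrite the primary copy in place.
        self.file_system[self.page_to_file_map[page]] = value

    def query_read(self, page: str) -> str:
        # Actions 68 to 73: use the auxiliary copy if one exists.
        if self.copy_flag[page]:
            return self.work_storage[self.copy_index[page]]
        return self.file_system[self.page_to_file_map[page]]

    def new_query_version(self) -> None:
        # Actions 74 to 76: assumes the primary pages are already consistent.
        self.work_storage.clear()
        self.copy_index.clear()
        self.copy_flag = {p: False for p in self.copy_flag}

    def physical_page_count(self) -> int:
        return len(self.file_system) + len(self.work_storage)


# Replay of the Figure 4 description (Actions 42 to 46).
vm = VersionManager(dict(zip(["P1", "P2", "P3", "P4", "P5", "P6", "P7"], "abcdefg")))
read_p5 = vm.transaction_read("P5")          # Action 42
vm.transaction_write("P3", "x")              # Action 43: one auxiliary copy made
pages_after_first_write = vm.physical_page_count()
vm.transaction_write("P3", "y")              # Action 45: no further copy needed
query_sees_before = vm.query_read("P3")
vm.new_query_version()                       # Action 46: auxiliary copy freed
query_sees_after = vm.query_read("P3")
```

In this model the extra storage in use at any moment equals the number of distinct pages written since the last time step, which is the property the copy flag vector is there to enforce.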

Methods for ensuring that the set of primary pages in the page store
represents a consistent state of the transaction database are
described below.

METHODS FOR FORCING A CONSISTENT TRANSACTION DATABASE STATE TO THE
PAGE STORE

One method for forcing a transaction consistent snapshot of the
database to disk is to get all transactions on the Transaction
Processor to a consistent state by quiescing them or having them halt
immediately at a point where they have left the database in a
consistent state and then force the Buffer 4.

If the transaction processing entity provides a bound on the length
of time which a modified committed page can remain in memory without
being flushed to disk, then the preferred embodiment (with least
disruption to transaction processing) is the algorithm for time step
without disturbance to transaction traffic described below.

DETAILS OF TIME STEP WITH GUARANTEED PAGEOUT

This section describes in more detail the time step algorithm for a
database which guarantees that any modified page will be written out
before time T (after it is modified in the buffer and the transaction
commits).

o For time t, define A(t) to be the minimum of the starting times of
  all transactions active at time t. A(t) is always before or equal
  to t.
o Let LSN_t denote the log offset reached at time t. Because all
  transaction activity earlier than A(t) has been resolved by time t,
  for a page update in the log at LSN_s where LSN_s < LSN_A(t), it is
  determinable by inspection of the log interval (LSN_s, LSN_t)
  whether this update was committed or aborted.

o We give a precise definition of what it means for a transaction
  processing entity to guarantee pageout within time T. For any page
  update which commits at time s, a version of the page including
  this update (and possibly later operations) will be flushed to disk
  by time s + T; for any page update which aborts at time s and has
  allowed an incorrect version of the page to be flushed to disk, a
  version of the page with the update undone (and possibly later
  operations overwritten) will be forced to disk by time s + T.

o For a transaction processing entity which guarantees pageout within
  time T, and a given time TS, every page update which precedes
  A(TS-T) will be reflected "on disk" by time TS (because the
  transaction responsible commits or aborts by time TS-T and the
  result of its operations is sure to be flushed to disk no more than
  time T later).

o To take a query processing time step at time TS the following
  actions are needed:

  1. Instantly after time TS, any action of transaction processing to
     write forces a pre-query copy of the page to be made with the
     image that existed at time TS.

  2. Times A(TS) and A(TS-T) are determined. This can be done (in the
     Intelligent Page Store) by starting from the transaction table
     written out in a checkpoint record in the log and running
     forward through the log noting the start and end of new
     transactions.

  3. By processing the log interval (LSN_A(TS-T), LSN_TS) against the
     image of the database in the Intelligent Page Store at time TS
     it is possible to construct a correct snapshot of the database
     reflecting all transactions which had committed by time A(TS).

  4. Note that every update preceding A(TS-T) has either committed or
     aborted, and the corresponding effect or undo of this update is
     reflected in the pre-query image in the Intelligent Page Store.

  5. We make a forward pass through the log from LSN_A(TS-T) to
     LSN_TS. For every page update before LSN_A(TS), it can be
     determined whether the owning transaction committed before
     LSN_A(TS); no page update in the interval (LSN_A(TS), LSN_TS)
     can have committed by A(TS). Furthermore, by comparing the log
     position of the update record with the LSN in the pre-query
     page, we can determine whether any update which commits before
     A(TS) is reflected there. Since we meet such updates in order
     (moving forward through the log), if they are not in the
     pre-query copy of the page, they can be applied. The algorithm
     for processing page updates from the log is as follows:

     for each page update (going forward through the log)
       if transaction committed before A(TS)
         if LSN(update) > LSN(preQDB page) then
           apply the update
           - since we started in the log at LSN_A(TS-T)
           - this must be the "next update in sequence"
         else LSN(update) less or equal LSN(preQDB page) then
           skip
           - this update already reflected in preQDB copy
       else transaction aborted or committed after A(TS)
         (all updates later than A(TS) in this category)
         if LSN(update) > LSN(preQDB page) then
           skip
           - the update was not flushed out to preQDB
         else LSN(update) = LSN(preQDB page)
           undo this update and any previous stacked undos for the page
           - this update made it to the page store but no later
           - version of the page did; undo the update then undo
           - any previously stacked updates for the page in reverse
           - order
         else LSN(update) < LSN(preQDB page)
           stack the undo to be applied later when all undos can be
           applied in reverse order to the preQDB page available

  6. On completion of this log pass the preQDB pages reflect the
     database image for all transactions which committed by A(TS).

o If there is not room in memory to manage a stack of undo operations
  on pages, this can be handled by marking and chaining log records
  and making a second reverse pass through the log from LSN_TS
  towards LSN_A(TS-T).
o Whenever a preQDB page is modified during the log processing, this
  is done with the technique for saving the image of the page
  described in the "Suspend and Flush" algorithm. Formally:

     if there is a preQDB copy already
        separate from the ADB version of the page then
       modify the existing preQDB page
     else
       create a preQDB version with the time TS image
       apply the undo or redo operation to this
o There is no reason for the pageout guarantee time to be the same at
  all times. After we select time TS for the time step, all that is
  necessary is to determine a time such that the effect of all page
  updates preceding that time has been resolved and forced to disk by
  TS. For a database which maintains a Dirty Page Table (DPT) of
  pages which have been modified in memory but not paged out,
  A(min(DPT(TS))) will have this property.

o In describing the time step algorithm we have not discussed failure
  and restart of the transaction processing entity. The simplest
  general approach (and our preferred embodiment) is that if the
  transaction processing entity fails, it is recovered before the
  next query time step is taken. Interleaving of time step advance
  with transaction system recovery is feasible but sensitive to the
  particular recovery strategy used by the transaction system.

  The guarantee of pageout would be the time T within which every
  resolved update would be flushed to disk given that no transaction
  processing system failure occurred.
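
The forward log pass of step 5 can be sketched in Python under some strong simplifying assumptions that are not part of the specification: each page is modelled as a small dictionary of named fields, each log record carries the before and after value of a single field, and the set of transactions that committed before A(TS) is supplied by the caller. The names `Page`, `Update` and `log_pass` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Page:
    fields: Dict[str, str]  # simplified page contents
    lsn: int                # LSN of the last update reflected in this image


@dataclass
class Update:
    lsn: int
    page: str
    key: str
    txn: str
    before: str
    after: str


def log_pass(pages: Dict[str, Page], log: List[Update],
             committed_before_a_ts: Set[str]) -> None:
    """Bring pre-query page images to the state committed by A(TS)."""
    stacked_undos: Dict[str, List[Update]] = {}
    for upd in sorted(log, key=lambda u: u.lsn):  # forward through the log
        page = pages[upd.page]
        if upd.txn in committed_before_a_ts:
            if upd.lsn > page.lsn:
                # Committed but not yet in the image: apply (redo) it.
                page.fields[upd.key] = upd.after
                page.lsn = upd.lsn
            # Otherwise the image already reflects this update: skip.
        else:
            if upd.lsn > page.lsn:
                continue  # never flushed to the page store: skip
            stack = stacked_undos.setdefault(upd.page, [])
            stack.append(upd)
            if upd.lsn == page.lsn:
                # Newest flushed update for this page: undo it, then the
                # previously stacked undos, in reverse log order.
                while stack:
                    undo = stack.pop()
                    page.fields[undo.key] = undo.before


# Illustrative data only: T2 committed before A(TS); T1 and T3 did not.
pages = {
    "P3": Page({"k1": "x1", "k2": "y2", "k3": "z3"}, lsn=30),
    "P5": Page({"k": "e"}, lsn=5),
}
log = [
    Update(10, "P3", "k1", "T1", "a", "x1"),
    Update(20, "P3", "k2", "T2", "b", "y2"),
    Update(30, "P3", "k3", "T1", "c", "z3"),
    Update(40, "P5", "k", "T2", "e", "E"),
    Update(50, "P5", "k", "T3", "E", "F"),
]
log_pass(pages, log, committed_before_a_ts={"T2"})
```

In this example the two flushed updates of T1 on page P3 are undone in reverse order, the committed but unflushed update of T2 on page P5 is applied, and the unflushed update of T3 is skipped. A real implementation would also have to save the time TS image before modifying a page, as the specification requires.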

OWNERSHIP OF DATABASE AND LOG STORAGE

In the above we have assumed that the Intelligent Page Store has
ownership of the storage used for database pages and the log from
transaction processing. This ensures that the Intelligent Page Store
has free access to this information.

Computation of quantities such as A(t) is most easily done by having
the Intelligent Page Store read the Active Transaction Table from
checkpoints in the log and note from the log stream subsequent
transaction begin and end operations.
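
As an illustration of this computation, the following Python sketch derives A(t) from a checkpointed transaction table and the begin and end records that follow it. The record layout, the function name `oldest_active_start` and the convention that A(t) equals t when no transaction is active are assumptions made for this example, not details given in the specification.

```python
from typing import Dict, Iterable, Tuple

# A log record is modelled as (time, kind, transaction id), where kind is
# "begin" or "end" (commit or abort). This layout is hypothetical.
LogRecord = Tuple[float, str, str]


def oldest_active_start(checkpoint_table: Dict[str, float],
                        records: Iterable[LogRecord],
                        t: float) -> float:
    """Return A(t): the earliest start time among transactions active at t.

    checkpoint_table maps each transaction active at the checkpoint to its
    start time; records must be in log order and follow the checkpoint.
    """
    active = dict(checkpoint_table)
    for when, kind, txn in records:
        if when > t:
            break
        if kind == "begin":
            active[txn] = when
        elif kind == "end":
            active.pop(txn, None)
    # Assumed convention: with no active transaction, A(t) is t itself.
    return min(active.values(), default=t)


checkpoint = {"T1": 1.0, "T2": 3.0}
records = [(4.0, "end", "T1"), (5.0, "begin", "T3"), (8.0, "end", "T2")]
a_at_6 = oldest_active_start(checkpoint, records, 6.0)
a_at_9 = oldest_active_start(checkpoint, records, 9.0)
```

The same function can be called with TS and with TS-T to obtain the two bounds of the log interval processed during a time step.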

Similarly, the simplest way to compute T is to read the dirty page
table from a log checkpoint record and, if more accuracy is required,
advance it in time by checking the store for subsequently flushed
pages.

AVOIDING DISRUPTION TO QUERY PROCESS- 
ING DURING TIME STEP ADVANCE 

As described above, all queries must execute against a valid query
version of the data. Therefore, when a new snapshot is being created
no queries should be running. This disruption of query processing can
be avoided at the cost of at most one additional copy of each
modified page. One can then allow queries running against the last
query version (A) to continue, while new queries see the new query
version (B). A further query version (C) is not created until all
queries accessing version A have completed, and version A has been
deleted.
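
The rule that at most two query versions coexist can be sketched as a small bookkeeping class. This Python example is an illustration only; the class `QueryVersionGate` and its methods are hypothetical names, and it tracks version numbers and running query counts without modelling the page copies themselves.

```python
from typing import Dict, Optional


class QueryVersionGate:
    """Tracks which query versions are still in use (illustrative only)."""

    def __init__(self) -> None:
        self.current = 0                      # version given to new queries
        self.previous: Optional[int] = None   # older version still being read
        self.running: Dict[int, int] = {0: 0}

    def start_query(self) -> int:
        self.running[self.current] += 1
        return self.current

    def finish_query(self, version: int) -> None:
        self.running[version] -= 1
        if version == self.previous and self.running[version] == 0:
            # Last reader of the older version is done: it can be deleted.
            del self.running[version]
            self.previous = None

    def try_advance(self) -> bool:
        """Create the next version unless an older one is still in use."""
        if self.previous is not None:
            return False
        if self.running[self.current] > 0:
            self.previous = self.current      # keep it for running queries
        else:
            del self.running[self.current]
        self.current += 1
        self.running[self.current] = 0
        return True


gate = QueryVersionGate()
q_a = gate.start_query()            # runs against version A (0)
advanced_to_b = gate.try_advance()  # version B (1) created, A retained
q_b = gate.start_query()            # new query sees version B
blocked = gate.try_advance()        # version C must wait for q_a
gate.finish_query(q_a)              # version A can now be deleted
advanced_to_c = gate.try_advance()  # version C (2) created
```

Refusing to advance while an older version is in use is what bounds the storage cost at one additional copy per modified page.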

SHARING INDEXES AND METADATA BETWEEN TRANSACTION AND QUERY PROCESSORS

The Intelligent Page Store has been described in terms of versioning
the relational table data. The same technique will work without
change to provide the Query Processor with access to indexes which
have been built by the Transaction Processor, the catalog and other
metadata. The implicit versioning will ensure that index and metadata
pages made available in this way to the Query Processor are exactly
synchronized and consistent with the query table data.

Claims

1. A database transaction and query processing system, comprising:

an intelligent page store for non-volatile storage of a primary
version of each page of a database and for creating and maintaining
at least one consistent snapshot version of said database for access
by queries which do not require access to the most recently updated
database pages, said intelligent page store maintaining only one
physical copy of any page of said database which is the same in said
primary version and said at least one snapshot version of said
database;

a transaction processor for accessing and updating said primary
version pages of said database, said primary version pages being made
available to said transaction processor by said intelligent page
store; and

a query processor independent of said transaction processor for
running queries against said at least one consistent snapshot version
of said database, said at least one consistent snapshot version of
said database being made available to said query processor.

2. A database system as defined in Claim 1 wherein said transaction
processor and said query processor are different physical entities.

3. A database system as defined in Claim 1 wherein said transaction
processor and said query processor are independent processes
implemented on the same physical entity.

4. A database system as defined in Claim 1 wherein said page store
maintains more than one consistent snapshot version of said database.

5. A database system as defined in Claim 1, wherein said transaction
processor includes a page buffer providing fast access storage for
the most recent version of some of the pages of said database, said
intelligent page store not necessarily containing a copy of all of
said most recent version pages in said page buffer.

6. A database system as defined in Claim 5, wherein said intelligent
page store creates a consistent snapshot version of said database by
flushing said page buffer into said intelligent page store, said
primary version of said database after said flushing being a
consistent snapshot version.

7. A database system as defined in Claim 1 having a database log and
wherein said intelligent page store derives a consistent snapshot
version from said primary version pages and said database log.

8. A database system as defined in Claim 7 wherein said database log
is stored in said intelligent page store.

9. A database system as defined in Claim 1, wherein said intelligent
page store has a first storage space for storing said primary version
of each page of said database and a second storage space for storing
at least one older version of some pages of said database.

10. A database transaction and query processing system comprising:

a transaction processing entity;

a query processing entity; and

an intelligent page store having an ordered set of time intervals,
the time interval that includes the current time being the current
time interval,

said intelligent page store providing non-volatile storage for a
database comprising a plurality of logical pages,

each said logical page being associated with one or more physical
pages,

one of said physical pages associated with each of said logical pages
being the primary physical page and any other physical pages
associated with each of said logical pages being designated as
auxiliary pages,

each auxiliary physical page being associated with a time step,

said transaction processing entity and said query processing entity
simultaneously making page access requests to said intelligent page
store,

said transaction processing entity page requests being either read or
write requests and said query processing entity page requests being
read requests;

said intelligent page store further comprising:

means for returning to said transaction processing entity the primary
physical page corresponding to a logical page requested by said
transaction processing entity;

means for returning to said query processing entity in response to a
specific query either the auxiliary page corresponding to a time
interval associated with said specific query or the primary physical
page if no auxiliary page exists that corresponds with the time
interval associated with said specific query;

means for creating a new auxiliary physical page when said
transaction processing entity updates the associated logical page if
no auxiliary physical page associated with that logical page exists
corresponding to the current time step;

means for creating a new time interval such that the primary and
auxiliary pages associated with said new time interval correspond to
a logically consistent set of pages of the database; and

means for deleting all of the auxiliary pages associated with any
time interval other than the current time interval after all queries
associated with that time interval have been completed.